A Parameterless Method for Efficiently Discovering Clusters of Arbitrary Shape in Large Datasets
نویسندگان
چکیده
Clustering is the problem of grouping data based on similarity and consists of maximizing the intra-group similarity while minimizing the inter-group similarity. The problem of clustering data sets is also known as unsupervised classification, since no class labels are given. However, all existing clustering algorithms require some parameters to steer the clustering process, such as the famous for the number of expected clusters, which constitutes a supervision of a sort. We present in this paper a new, efficient, fast and scalable clustering algorithm that clusters over a range of resolutions and finds a potential optimum clustering without requiring any parameter input. Our experiments show that our algorithm outperforms most existing clustering algorithms in quality and speed for large data sets.
منابع مشابه
A Neighborhood-Based Clustering Algorithm
In this paper, we present a new clustering algorithm, NBC, i.e., Neighborhood Based Clustering, which discovers clusters based on the neighborhood characteristics of data. The NBC algorithm has the following advantages: (1) NBC is effective in discovering clusters of arbitrary shape and different densities; (2) NBC needs fewer input parameters than the existing clustering algorithms; (3) NBC ca...
متن کاملA Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algo...
متن کاملAutomatically finding clusters in normalized cuts
Normalized Cuts is a state-of-the-art spectral method for clustering. By applying spectral techniques, the data becomes easier to cluster and then k-means is classically used. Unfortunately the number of clusters must be manually set and it is very sensitive to initialization. Moreover, k-means tends to split large clusters, to merge small clusters, and to favor convex-shaped clusters. In this ...
متن کاملABACUS: Mining Arbitrary Shaped Clusters from Large Datasets based on Backbone Identification
A wide variety of clustering algorithms exist that cater to applications based on certain special characteristics of the data. Our focus is on methods that capture arbitrary shaped clusters in data, the so called spatial clustering algorithms. With the growing size of spatial datasets from diverse sources, the need for scalable algorithms is paramount. We propose a shape-based clustering algori...
متن کاملAssessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...
متن کامل